1.    INTRODUCTION

In  this  paper  we  consider  a  game  in  which  a  single  long-run  player  faces 
a  sequence  of  short-run  opponents,  each  of  whom  plays  only  once,  but  is 
informed  of  the  outcomes  of  play  in  each  previous  period.   These  outcomes  may 
not  reveal  the  long-run  player's  past  choices,  either  because  the  long-run 
player's  action  is  subject  to  moral  hazard,  or  because  the  long-run  player  has 
chosen  to  play  a  mixed  strategy:   In  either  case,  the  observed  outcomes  give 
only  imperfect,  probabilistic  information  about  the  long-run  player's  choices. 
We  further  assume  that  the  short-run  players  are  uncertain  of  the  long-run 
player's payoff function, and model this uncertainty with a probability
distribution  over  the  "types"  of  the  long-run  player.  We  focus  on  "commitment 
types"  who  play  the  same  stage-game  strategy  in  every  period  of  play.   Our 
main  result  is  that  the  long-run  player's  payoff  in  any  Nash  equilibrium  is 
bounded  below  by  an  amount  that  converges,  as  the  discount  factor  tends  to 
one, to the most he could get by committing himself to any of the strategies
for  which  the  corresponding  commitment  type  has  positive  probability.   A  loose 
way  of  saying  this  is  that  the  long-run  player  can  obtain  a  reputation  for 
always  playing  any  strategy  which  the  short-run  players  believe  has  positive 
probability  of  always  being  played.   Note  that  this  reputation,  and  the 
corresponding  lower  bound  on  the  long-run  player's  payoff,  depend  only  on  the 
type  that  the  long-run  player  prefers  to  mimic  and  is  independent  of  the  other 
types  that  have  positive  probability.    In  Fudenberg-Levine  [1987]  we  proved  a 
similar  but  more  restrictive  theorem.   There,  we  assumed  that  the  short-run 
players observe the actions that the long-run player has chosen, and also
restricted  attention  to  reputations  for  playing  pure  strategies.   Under  these 
assumptions,  if  the  long-run  player  fails  to  play  a  strategy  in  any  period  the 
short-run  players  are  certain  to  learn  that  he  is  not  the  corresponding 
commitment type. Conversely, if the long-run player plays strategy s in a
period where the short-run players do not expect him to do so, the short-run
players are certain to be "surprised." When the short-run players do not
directly observe the long-run player's choice of action in the stage game, or
when commitment strategies are mixed instead of pure, the short-run players
are not certain to detect deviations and our previous analysis does not apply.

One implication of our results is that the long-run player can build a
reputation for playing any mixed strategy for which the short-run players
assign positive prior probability to the corresponding type. The case of
mixed strategies in games without moral hazard is of particular interest in
light of the results of Fudenberg-Kreps-Maskin [1987] on repeated games with
long-run and short-run players. Fudenberg-Kreps-Maskin showed that the
pure-strategy commitment payoff is not always a tight lower bound on what the
long-run player can obtain in any equilibrium, because in some games by
playing a mixed strategy the long-run player can induce the short-run players
to choose a more favorable response. However, when the long-run player's
payoff function is common knowledge, in general he cannot do as well as he
could by committing himself to a mixed strategy. Our results here show that if
the corresponding commitment types have positive prior probability then the
long-run player can in fact build a reputation for playing a mixed strategy,
and thus attain a higher payoff than in any equilibrium of the unperturbed
game.

We prove our result as follows: Let $\gamma_1$ denote the distribution over
outcomes that corresponds to strategy $\sigma_1$, and imagine that the short-run
players assign positive prior probability to the long-run player being a type
that always plays $\sigma_1$. Since the short-run players are myopic, they will play
a best response to $\gamma_1$ in any period where they expect the distribution over
outcomes to be close to $\gamma_1$. Now imagine that the long-run player chooses to
always play $\sigma_1$. In any period where the short-run players do not play a best
response to $\gamma_1$, there is a non-negligible probability that they will revise
their posterior beliefs a non-negligible amount in the direction of the
long-run player being a type who always plays $\sigma_1$. Intuitively, if the
short-run players do not play a best response to $\sigma_1$, they will, with some
probability, be "surprised" when $\sigma_1$ is played.

After sufficiently many of these surprises, the short-run players will
attach a very high probability to the long-run player playing $\sigma_1$ for the rest
of the game, and thus will play best responses to $\gamma_1$ from then on. Thus one
would expect that for any $\epsilon$ there is a $K(\epsilon)$ such that with probability $(1-\epsilon)$
the short-run players play best responses to $\gamma_1$ in all but $K(\epsilon)$ periods. The
key to our paper is finding an upper bound on this $K(\epsilon)$ that holds uniformly
over all equilibria and all discount factors. To do this we view the
likelihood ratio corresponding to the short-run players' beliefs about the
long-run player's type as a positive supermartingale. Excluding periods where
the short-run players play a best response to $\sigma_1$, this supermartingale is
"active" in the sense that in each period where the supermartingale's value is
positive, there is a non-negligible probability of a non-negligible jump. Using
theorems about uniform bounds on upcrossing numbers for martingales, we derive
uniform bounds on the rate at which active supermartingales converge to zero.

To  allow  the  long-run  player  to  build  a  reputation  for  playing  a  mixed 
strategy,  we  initially  assume  there  is  a  positive  prior  probability  that  the 
long-run  player  is  a  type  who  will  always  use  that  mixed  strategy.   Since 
there is a continuum of mixed strategies for the stage game, in the context
of reputations for mixed strategies it may seem more natural to consider
models  with  a  continuum  of  commitment  types  and  a  (continuous)  prior 
distribution,  so  that  any  particular  commitment  type  has  prior  probability 
zero.   In  the  concluding  section  of  the  paper  we  show  how  our  results  extend 
to  this  case. 


2.    THE  MODEL 

The long-run player, player 1, faces an infinite sequence of different
short-lived player 2's. Each period, starting with period 0, player 1 selects
an action from his action set $A_1$, while that period's player 2 selects an
action from $A_2$. We assume that players 1 and 2 move simultaneously in each
period and that the $A_i$ are finite sets; our earlier paper provided extensions
of both of these assumptions. However, in that paper we assumed that the
short-run players observed player 1's choice of actions, and we restricted
attention to reputations for playing pure strategies. In this paper we will
assume that the short-run player's payoffs depend not on player 1's choice of
action $a_1$, but rather on a stochastic "outcome" $y_1$ which is drawn from a
finite set $Y_1$ with distribution $\rho(y_1 \mid a_1)$. Corresponding to the action spaces
$A_i$ are the spaces $\Sigma_i$ of mixed strategies; when player 1's mixed action is $\sigma_1$
the resulting distribution on $y_1$ is

$$\sum_{a_1 \in A_1} \sigma_1(a_1)\, \rho(y_1 \mid a_1).$$

(Note that this formulation includes the special case where $A_1$ and $Y_1$ are
isomorphic.) We denote the distribution over outcomes corresponding to
strategy $\sigma_1$ by $\gamma_1 = \rho \circ \sigma_1$. Since it is unimportant whether or not the
short-run players' actions are observable, for simplicity we will assume they
are, and identify the space $A_2$ with a space $Y_2$ of outcomes of player 2's play.
The short-run players all have the same expected utility function

$$u_2 : Y_1 \times Y_2 \to \mathbb{R}.$$

In an abuse of notation, we let $u_2(\sigma) = u_2(\sigma_1, \sigma_2)$ denote the expected payoff
corresponding to the mixed strategy $\sigma \in \Sigma$. Each period's short-run player acts
to maximize that period's payoff.
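As a minimal sketch of the composition $\rho \circ \sigma_1$ described above, the following Python fragment computes the outcome distribution induced by a mixed action. The function name, the dictionary representation, and the two-action, two-signal numbers are illustrative assumptions and do not come from the paper.

```python
def outcome_distribution(sigma1, rho):
    """Return gamma_1 = rho o sigma_1 as a dict over outcomes.

    sigma1: dict mapping each action a_1 to its probability sigma_1(a_1).
    rho:    dict mapping each action a_1 to a dict {outcome y_1: rho(y_1 | a_1)}.
    """
    gamma = {}
    for action, prob_action in sigma1.items():
        for outcome, prob_outcome in rho[action].items():
            # Accumulate sigma_1(a_1) * rho(y_1 | a_1) over all actions a_1.
            gamma[outcome] = gamma.get(outcome, 0.0) + prob_action * prob_outcome
    return gamma


# Hypothetical example: two actions observed through a noisy binary signal.
rho_example = {
    "high": {"good": 0.9, "bad": 0.1},
    "low": {"good": 0.3, "bad": 0.7},
}
sigma_example = {"high": 0.5, "low": 0.5}
gamma_example = outcome_distribution(sigma_example, rho_example)
```

In this hypothetical example the short-run players see only "good" or "bad", so a single outcome cannot reveal which action, let alone which mixed strategy, player 1 chose.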


Both players know the short-run player's payoff function. On the other
hand, player 1 knows his own payoff function, but the short-run players do
not. We represent their uncertainty about player 1's payoffs using Harsanyi's
[1967] notion of a game of incomplete information. Player 1's payoff is
identified with his "type" $\omega \in \Omega$, where $\Omega$ is a countable set. It is common
knowledge that the short-run players have (identical) prior beliefs $\mu$ about $\omega$,
represented by a probability measure on $\Omega$.

Let $H = (Y)^{\infty}$ be the measure space of all infinite histories of outcomes,
and let $\mathcal{H}$ be the corresponding space of probability measures on $H$. Player 1's
payoff $u_1(\lambda, \omega)$ depends on the distribution $\lambda \in \mathcal{H}$ and his type $\omega$. In
particular, for some $\omega$, $u_1$ may not be additively separable over time, and need
not be an expected utility function.

Both long-run and short-run players can observe and condition their play
at time t on the entire past history of the realized outcomes of both players,
but not on their choice of mixed strategy. (In the case where $Y_1 \cong A_1$, the
realized outcome will reveal player 1's choice of action, but not his choice
of mixed strategy.) If $H_t$ denotes the set of possible histories (sequences of
outcomes) through time t, then a strategy for the period-t player 2 is a map
$\sigma_2^t : H_{t-1} \to \Sigma_2$. Since player 1 knows his type, a strategy for player 1 is a
sequence of maps $\sigma_1^t : H_{t-1} \times \Omega \to \Sigma_1$, specifying his play as a function of
history and his type.

We denote this game $G(\delta, \mu)$ to emphasize that it depends on the long-run
player's discount factor and on the beliefs of the short-run players.

3.    THE  THEOREM 

Let $B : \Sigma_1 \to \Sigma_2$ be the correspondence that maps mixed strategies by
player 1 to the best responses of player 2 (using the payoff $u_2$). Because the
short-run players play only once, in any equilibrium of $G(\delta, \mu)$, each period's
play by the short-run player must lie in the graph of $B$. The short-run
players' behavior can also be characterized by how they respond to
distributions over outcomes. Letting $\Gamma_1$ be the space of probability
distributions over $Y_1$, we denote this correspondence by $B : \Gamma_1 \to \Sigma_2$. For each
strategy $\sigma_1$ let $\omega(\sigma_1)$ be the "commitment type" which has "play $\sigma_1$ forever" as
its strictly dominant strategy for the repeated game, and let
$P_1(\Omega, \mu) = \{\omega \in \Omega \mid \omega = \omega(\sigma_1) \text{ for some } \sigma_1 \text{ and } \mu(\omega) > 0\}$ be the set of commitment
types which have positive prior probability. In this section we assume that
the set of types $\Omega$ is countable and that the set $P_1$ is non-empty. Given that
the set of strategies $\Sigma_1$ is uncountable, it might be more appealing to
consider a density over the set of commitment strategies, so that no single
commitment type has positive prior probability. We consider this extension
below.

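Now fix a type $\omega_0$ whose preferences correspond to the expected discounted
value of per-period payoffs:

$$E\,(1-\delta) \sum_{t=0}^{\infty} \delta^t\, v_1(a_1^t, a_2^t),$$

where $v_1 : A_1 \times A_2 \to \mathbb{R}$. Given the set of commitment types $P_1$, which
corresponds to reputations the long-run player might be able to maintain, we
ask which reputation would be most desirable. Define

(1)        $v_1^*(P_1) = \max_{\omega(\sigma_1) \in P_1}\ \min_{\sigma_2 \in B(\sigma_1)} v_1(\sigma_1, \sigma_2)$,

and let $\sigma_1^*(P_1)$ satisfy

$$\min_{\sigma_2 \in B(\sigma_1^*(P_1))} v_1(\sigma_1^*(P_1), \sigma_2) = v_1^*(P_1).$$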
We call $v_1^*(P_1)$ the type-$\omega_0$ commitment payoff relative to the set $P_1$. Since
we will hold $P_1$ fixed throughout the paper, we will simplify this to $v_1^*$; the
dependence of what follows on $P_1$ should be clear. Let $\sigma_1^* = \sigma_1^*(P_1)$ denote the
type-$\omega_0$ commitment action, and $\gamma_1^* = \rho \circ \sigma_1^*$ the commitment distribution.
(Note that there may be several commitment actions.) Finally, let $\omega^*$ $(= \omega^*(P_1))$
be a type such that such a player 1's best strategy in the repeated game is to
play $\sigma_1^*(P_1)$ in every period. Nash equilibrium requires that if $\omega^*$ has
positive probability, then $\sigma_1^t(h_{t-1}, \omega^*) = \sigma_1^*$ for all t and almost all $h_{t-1}$. We
will say that type $\omega^*$ is "the" commitment type. Our goal is to argue that
with the "right" kind of incomplete information, type $\omega_0$'s worst Nash
equilibrium payoff is close to $v_1^*$ when $\delta$ is close to one.

Since the game has countably many types and periods, and finitely many
actions per type and period, the set of Nash equilibria is a closed non-empty
set. This follows from the standard results on the existence of mixed
strategy equilibria in finite games, and the limiting results of Fudenberg and
Levine [1983, 1985]. Consequently, if $\mu(\omega_0) > 0$, we may define $\underline{V}_0(\delta, \mu)$ to
be type $\omega_0$'s least payoff in any Nash equilibrium of the game $G(\delta, \mu)$.


Theorem 1: Assume $\mu(\omega_0) > 0$, and that $\mu(\omega^*) = \mu^* > 0$. Then for all $\alpha > 0$,
there is a $\underline{\delta} < 1$ such that for all $\delta \in (\underline{\delta}, 1)$

$$\underline{V}_0(\delta, \mu) \geq (1-\alpha)\, v_1^* + \alpha \min v_1,$$

where $\min v_1$ is the minimum over $A_1 \times A_2$. This says that if type $\omega_0$ is
patient relative to the prior probability $\mu^*$ that he is "tough", then he can
achieve almost his commitment payoff. Moreover, the lower bound on type $\omega_0$'s
payoff is independent of the preferences of the other types in $\Omega$ to
which $\mu$ assigns positive probability. The condition $\mu(\omega_0) > 0$ is necessary
only for $\underline{V}_0$ to be well defined.


Proof: We fix an equilibrium $(\hat{\sigma}_1, \hat{\sigma}_2)$ of $G(\delta, \mu)$, and consider the strategy
for player 1 of always playing $\sigma_1^*$. The next section of the paper is devoted to
showing that for any $\epsilon > 0$, there exists a $K(\mu^*, \epsilon)$, otherwise independent of $\delta$
and $\mu$, such that player 2's equilibrium strategy chooses actions outside of
$B(\sigma_1^*)$ more than $K(\mu^*, \epsilon)$ times with probability no more than $\epsilon$. If we choose
$\epsilon = \alpha/2$ and $\delta$ sufficiently large that $\delta^{K(\mu^*, \epsilon)} \geq (1-\alpha)/(1-\epsilon)$, then (since $\mu^* > 0$)
type $\omega_0$ gets at least $(1-\alpha) v_1^* + \alpha \min v_1$. Consequently he gets at least
this much in equilibrium. ■
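The arithmetic behind the choice of $\delta$ can be checked with a short Python sketch. It assumes normalized (average) discounted payoffs, the worst case in which all $K$ periods of unfavorable play come first, and the choice $\epsilon = \alpha/2$ from the proof; the function names and the sample numbers are illustrative assumptions, and $K$ is simply taken as given.

```python
def min_discount_factor(alpha, K):
    """Smallest delta with delta**K >= (1 - alpha) / (1 - eps), where eps = alpha / 2.

    Assumes 0 < alpha < 1 and an integer K >= 0 taken as given.
    """
    eps = alpha / 2.0
    if K == 0:
        return 0.0
    return ((1.0 - alpha) / (1.0 - eps)) ** (1.0 / K)


def payoff_lower_bound(delta, eps, K, v_star, v_min):
    """(1 - eps) * delta**K * v_star + [1 - (1 - eps) * delta**K] * v_min.

    With probability at least 1 - eps there are at most K bad periods; in the
    worst case they come first, so v_star is earned from period K onward.
    """
    weight = (1.0 - eps) * delta ** K
    return weight * v_star + (1.0 - weight) * v_min


# Hypothetical numbers: alpha = 0.1, K = 20, commitment payoff 2, worst payoff 0.
alpha_example, K_example = 0.1, 20
delta_bar = min_discount_factor(alpha_example, K_example)
bound_at_threshold = payoff_lower_bound(delta_bar, alpha_example / 2.0, K_example, 2.0, 0.0)
target = (1.0 - alpha_example) * 2.0 + alpha_example * 0.0
```

At the threshold the weight on $v_1^*$ equals $1-\alpha$ exactly, and for any larger $\delta$ the bound only improves, because $v_1^* \geq \min v_1$.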

4.    BAYESIAN INFERENCE, SHORT-RUN BEST RESPONSES, AND ACTIVE SUPERMARTINGALES

This section shows that if player 1 plays strategy $\sigma_1^*$ in every period, it
is very likely that the player 2's will choose actions in $B(\sigma_1^*)$ in all but a
small number of periods. The key to the proof is a strengthening of the fact
that, when player 1 plays strategy $\sigma_1^*$, the likelihood ratio corresponding to
player 1 not being type $\omega^*$ is a positive supermartingale. We strengthen this
by observing that in the periods when player 2 does not play a best response
to $\sigma_1^*$, the odds ratio is an "active supermartingale" in the sense of having a
non-negligible probability of jumping a non-negligible amount. While positive
supermartingales can converge to positive limits, or decrease towards zero
very slowly, the Appendix shows that positive supermartingales which are
active in our sense converge to zero at a uniform rate that depends only on
their initial value and their degree of "activity."

Let $h_t \in H_t$ be identified with the subset of histories $h \in H$ that
coincide with $h_t$ through and including period t. In this way $H_t$ may be viewed
as a subset of $H$. Type $\omega_0$ wishes to calculate his payoffs if he plays $\sigma_1^*$ in
every period, given the equilibrium $(\hat{\sigma}_1, \hat{\sigma}_2)$. Thus he should use the measure
over $H$ defined by $(\sigma_1^*, \hat{\sigma}_2)$. Subsequently, we use this measure and $H$ as our
underlying sample space. However, the player 2's use player 1's equilibrium
strategy $\hat{\sigma}_1^t(h_{t-1}, \omega)$ in computing their beliefs and optimal responses. Let
$\mu(\omega \mid h_{t-1})$ be the conditional probability distribution over player 1's types
obtained by updating the prior $\mu(\omega)$ in accordance with Bayes law and the
equilibrium strategy $\hat{\sigma}_1$.

Number the outcomes in $Y_1$ from 1 to n, and let $p_k$ be the probability
that the commitment distribution $\gamma_1^* = \rho \circ \sigma_1^*$ assigns to outcome k. Let
$q(h_{t-1})$ be the distribution over time-t outcomes predicted by $\hat{\sigma}_1$ conditional on
the history being $h_{t-1}$ and $\omega \in \Omega_0 \equiv \Omega \setminus \{\omega^*\}$:

$$q(h_{t-1}) = \Big[ \sum_{\omega \in \Omega_0} \mu(\omega \mid h_{t-1})\, \rho \circ \hat{\sigma}_1^t(h_{t-1}, \omega) \Big] \Big/ \big(1 - \mu(\omega^* \mid h_{t-1})\big).$$

Let $d(\gamma_1, \gamma_1') = \sup_k \| \gamma_1(y_k) - \gamma_1'(y_k) \|$ be the distance in the sup norm between
distributions $\gamma_1$ and $\gamma_1'$, and set $\Delta(h_{t-1}) = d(q(h_{t-1}), \gamma_1^*)$. Also, define
$\nu(h_{t-1}) = (1 - \mu(\omega^* \mid h_{t-1}))\, q(h_{t-1}) + \mu(\omega^* \mid h_{t-1})\, p$. This is the probability
distribution over outcomes that player 2 expects to face in period t. Our
first claim is that if the equilibrium distribution $q(h_{t-1})$ is close to the
commitment distribution $p$ in the sense that $\Delta$ is small, then $\omega_0$ gets at least
$v_1^*$ in period t.


Lemma 1: There is a number $\Delta_0 > 0$ such that if $h_{t-1}$ has positive probability
and $\Delta(h_{t-1}) < \Delta_0$, then the equilibrium strategy of player 2 gives type $\omega_0$ of
player 1 at least $v_1^*$ at time t.

Proof: Observe that $d(\nu(h_{t-1}), \gamma_1^*) \leq \Delta(h_{t-1})$, and that $v_1^*$ is defined relative
to the best response to $\gamma_1^*$ that type $\omega_0$ likes the least. Since player 2's
best response correspondence $B$ is upper hemi-continuous, for $\nu$ close to $\gamma_1^*$ we
know that each element of $B(\nu)$ must be close to an element of $B(\gamma_1^*)$. Now
since player 2 has a finite number of pure strategies, a strategy $\sigma_2$ can be
close to an element of $B(\gamma_1^*)$ only if it places probability close to one on
pure strategies in the support of $B(\gamma_1^*)$. And since player two must be
indifferent between all strategies to which he is willing to assign positive
probability, we conclude that the support of $B(\nu)$ must be contained in the
support of $B(\gamma_1^*)$ for $\nu$ sufficiently close to $\gamma_1^*$. The conclusion of the lemma
follows immediately. ■
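The first observation in the proof, that $d(\nu, \gamma_1^*) \leq \Delta$, follows because $\nu - p = (1 - \mu(\omega^* \mid h_{t-1}))(q - p)$. A minimal Python sketch of this calculation is given below; the list representation of distributions and the numbers are illustrative assumptions.

```python
def sup_distance(dist_a, dist_b):
    """Sup-norm distance between two distributions given as equal-length lists."""
    return max(abs(x - y) for x, y in zip(dist_a, dist_b))


def expected_outcome_distribution(mu_star, q, p):
    """nu = (1 - mu_star) * q + mu_star * p, the distribution player 2 expects."""
    return [(1.0 - mu_star) * qk + mu_star * pk for qk, pk in zip(q, p)]


# Hypothetical numbers: commitment distribution p, non-commitment play q,
# and posterior probability mu_star of the commitment type.
p_example = [0.6, 0.4]
q_example = [0.3, 0.7]
mu_star_example = 0.25

nu_example = expected_outcome_distribution(mu_star_example, q_example, p_example)
Delta_example = sup_distance(q_example, p_example)
distance_nu_p = sup_distance(nu_example, p_example)
```

Here the distance from $\nu$ to $p$ is $(1 - 0.25) \times 0.3 = 0.225$, which is below $\Delta = 0.3$, as the proof asserts.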

Define families of random variables $(\hat{p}_t(h), \hat{q}_t(h))$ by setting $\hat{p}_t = p_k$ and
$\hat{q}_t = q_k(h_{t-1})$ if $y_k$ occurs at time t. Define another family of random variables
$L_t(h)$ as follows: For $t = 0$ set

$$L_0(h) = \frac{1 - \mu(\omega^*)}{\mu(\omega^*)}.$$

Then let $h_t \in H_t$ be the finite history that coincides with $h$ through and
including time t and define recursively

$$L_t(h) = \frac{\hat{q}_t(h)}{\hat{p}_t(h)}\, L_{t-1}(h).$$

It is well known that $L_t(h) = [1 - \mu(\omega^* \mid h_t)] / \mu(\omega^* \mid h_t)$ is player 2's posterior
odds ratio that player 1 is not type $\omega^*$. It is also well known that this odds
ratio is a supermartingale. We give a proof for completeness:

Lemma 2: $L_t(h) = [1 - \mu(\omega^* \mid h_t)] / \mu(\omega^* \mid h_t)$ and $(L_t, H_t)$ is a supermartingale.


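Proof: The first claim is true for $L_0$. Imagine it is true for $L_{t-1}$; then

$$\frac{1 - \mu(\omega^* \mid h_t)}{\mu(\omega^* \mid h_t)} = \frac{\hat{q}_t(h)\, [1 - \mu(\omega^* \mid h_{t-1})]}{\hat{p}_t\, \mu(\omega^* \mid h_{t-1})} = \frac{\hat{q}_t}{\hat{p}_t}\, L_{t-1} = L_t.$$

To see that $L_t$ is a supermartingale, observe that

$$E[L_t \mid h_{t-1}] = \sum_{k \in \mathrm{supp}(p)} p_k\, \frac{q_k(h_{t-1})}{p_k}\, L_{t-1} = L_{t-1} \sum_{k \in \mathrm{supp}(p)} q_k(h_{t-1}) \leq L_{t-1}. \qquad ■$$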
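The two claims of Lemma 2 can be illustrated numerically. The sketch below assumes, purely for illustration, that the non-commitment types' predicted distribution $q$ is the same after every history; the function names and numbers are hypothetical.

```python
def update_posterior(mu_star, p_k, q_k):
    """Bayes' rule for the probability of the commitment type after outcome k."""
    return mu_star * p_k / (mu_star * p_k + (1.0 - mu_star) * q_k)


def update_odds(L_prev, p_k, q_k):
    """One step of the recursion L_t = (q_k / p_k) * L_{t-1}."""
    return (q_k / p_k) * L_prev


def conditional_expected_odds(L_prev, p, q):
    """E[L_t | h_{t-1}] when outcomes are drawn from the commitment distribution p."""
    return sum(pk * update_odds(L_prev, pk, qk) for pk, qk in zip(p, q) if pk > 0)


# Hypothetical numbers: outcome 2 has probability zero under the commitment type.
p_example = [0.6, 0.4, 0.0]
q_example = [0.3, 0.5, 0.2]
mu0 = 0.2
L0 = (1.0 - mu0) / mu0

# Observe outcome 0, then update both the posterior and the odds ratio.
mu1 = update_posterior(mu0, p_example[0], q_example[0])
L1 = update_odds(L0, p_example[0], q_example[0])
expected_L1 = conditional_expected_odds(L0, p_example, q_example)
```

The expected odds ratio falls strictly here because the non-commitment types put weight 0.2 on an outcome outside the support of $p$; when the supports coincide, the inequality in the proof holds with equality.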

Lemma 3: If $h_{t-1}$ has positive probability and $L_{t-1} < \Delta_0$, then the equilibrium
strategy of player 2 gives type $\omega_0$ at least $v_1^*$ (with probability one) at $h_{t-1}$.

Proof: If $L_{t-1} < \Delta_0$ then $1 - \mu(\omega^* \mid h_{t-1}) < \mu(\omega^* \mid h_{t-1})\, \Delta_0 \leq \Delta_0$. Since
$d(\nu(h_{t-1}), p) \leq 1 - \mu(\omega^* \mid h_{t-1}) < \Delta_0$, the conclusion follows from Lemma 1. ■

Lemma 4: If $\Delta(h_{t-1}) > \Delta_0$ then $\Pr[L_t/L_{t-1} - 1 \leq -\Delta_0/n \mid h_{t-1}] \geq \Delta_0/n$ almost
surely.

Proof: Note first that $L_t/L_{t-1} = \hat{q}_t/\hat{p}_t$, which is $q_1(h_{t-1})/p_1$ with probability
$p_1$; $q_2(h_{t-1})/p_2$ with probability $p_2$, and so forth for those indices k for
which $p_k \neq 0$. Consequently, it suffices to show for some k

$$q_k(h_{t-1})/p_k - 1 \leq -\Delta_0/n \quad \text{and} \quad p_k \geq \Delta_0/n.$$

Suppose, without loss of generality, that
$\Delta(h_{t-1}) = \| p_1 - q_1(h_{t-1}) \| > \Delta_0$. If $p_1 - q_1(h_{t-1}) > \Delta_0$ then $p_1 > \Delta_0$, and
$1 - q_1(h_{t-1})/p_1 \geq \Delta_0/p_1 \geq \Delta_0$, and we are done. If, on the other hand,
$q_1(h_{t-1}) - p_1 > \Delta_0$, then $\sum_{k>1} (p_k - q_k(h_{t-1})) > \Delta_0$. Consequently
$n \max_{k>1} (p_k - q_k(h_{t-1})) > \Delta_0$, and, for k=2 say, we have
$p_2 - q_2(h_{t-1}) > \Delta_0/n$. Again, we conclude $p_2 > \Delta_0/n$ and
$1 - q_2(h_{t-1})/p_2 \geq \Delta_0/n$. ■
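The sufficient condition in the proof of Lemma 4 is easy to test numerically. The sketch below searches for an outcome index with the two required properties and then checks the claim on randomly drawn distributions; the function names, the random sampling scheme, and the parameter values are illustrative assumptions.

```python
import random


def sup_distance(dist_a, dist_b):
    """Sup-norm distance between two distributions given as equal-length lists."""
    return max(abs(x - y) for x, y in zip(dist_a, dist_b))


def informative_outcome(p, q, delta0):
    """Return an index k with p[k] >= delta0/n and q[k]/p[k] - 1 <= -delta0/n, else None."""
    n = len(p)
    for k in range(n):
        # The first test guarantees p[k] > 0 before the division is evaluated.
        if p[k] >= delta0 / n and q[k] / p[k] - 1.0 <= -delta0 / n:
            return k
    return None


def random_distribution(rng, n):
    """Draw a probability vector by normalizing n uniform weights."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def check_lemma4(trials, n, delta0, seed):
    """Count draws with sup distance above delta0; each must have an informative outcome."""
    rng = random.Random(seed)
    checked = 0
    for _ in range(trials):
        p = random_distribution(rng, n)
        q = random_distribution(rng, n)
        if sup_distance(p, q) > delta0:
            assert informative_outcome(p, q, delta0) is not None
            checked += 1
    return checked


# Hypothetical deterministic example with n = 3 and delta0 = 0.25.
k_example = informative_outcome([0.5, 0.3, 0.2], [0.8, 0.1, 0.1], 0.25)
num_checked = check_lemma4(trials=2000, n=4, delta0=0.1, seed=0)
```

In the deterministic example the non-commitment types overweight outcome 0, so the informative outcome is one they underweight, matching the second case of the proof.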
Lemma 4 shows that in the periods where the marginal distribution on the
actions of the non-commitment types differs significantly from the commitment
distribution, the likelihood ratio is likely to jump down by a significant
amount. Of course, in periods where $\Delta(h_{t-1})$ is small, the likelihood ratio
need not change much, but in these periods we know from Lemma 1 that player
two will play a best response to the commitment strategy, and so the payoff of
type $\omega_0$ of player one is at least $v_1^*$. The key to our result is to show that
with high probability there are few periods where player one's payoff is less
than $v_1^*$, that is, few periods where $\Delta(h_{t-1}) > \Delta_0$. We will call these bad
periods. To show that there are unlikely to be many bad periods, we introduce
a new supermartingale which includes all of the bad periods from the
supermartingale $L$.

We first define a sequence of stopping times. Set $\tau_0 = 0$. If $\tau_{k-1}(h) = \infty$,
set $\tau_k(h) = \infty$ as well. If $\tau_{k-1}(h)$ is finite, set $\tau_k(h)$ to be the first
time $t > \tau_{k-1}(h)$ such that either

(1) $\Pr[\, \| L_t/L_{t-1} - 1 \| \geq \Delta_0/n \mid h_{t-1} ] \geq \Delta_0/n$, or

(2) $L_t/L_{\tau_{k-1}} - 1 \geq \Delta_0/2n$, or

(3) if no such time exists, set $\tau_k(h) = \infty$.

Lemma 4 shows that this sequence of stopping times picks out all the bad
date-history pairs, that is, those for which $\Delta(h_{t-1}) > \Delta_0$.

The faster process $\tilde{L}_k$ is defined by $\tilde{L}_k = L_{\tau_k}$. Since the $\tau_k$ are stopping
times, $\tilde{L}_k$ is a supermartingale, with an associated filtration whose events we
denote $\tilde{h}_k$. Moreover, we will show that $\tilde{L}_k$ is an "active" supermartingale in
the following sense:

Definition: A process $\tilde{L}$ with nonnegative values is an active supermartingale
with activity $\psi$ if

$$\Pr[\, \| \tilde{L}_{k+1}/\tilde{L}_k - 1 \| \geq \psi \mid \tilde{h}_k ] \geq \psi$$

for all histories $\tilde{h}_k$ such that $\tilde{L}_k > 0$.

Lemma 5: $\tilde{L}$ is an active supermartingale with activity $\Delta_0/2n$.

Proof: Since the $\tau_k$ are stopping times, $\tilde{L}_k$ is a supermartingale. Next, we
claim that if $\tilde{h}_{k-1}$ is such that $\tilde{L}_{k-1} > 0$,

$$\Pr[\, \| \tilde{L}_k/\tilde{L}_{k-1} - 1 \| \geq \Delta_0/2n \mid \tilde{h}_{k-1} ] \geq \Delta_0/2n.$$

To see this, let $s = \tau_{k-1}(h)$, which is a constant with respect to $\tilde{h}_{k-1}$;
$\tau_k(\tilde{h}_{k-1})$ is a random variable. We will show that

$$\Pr[\, \| L_{\tau_k}/L_s - 1 \| \geq \Delta_0/2n \mid h_s ] \geq \Delta_0/n.$$

One of the three rules in the definition of the $\tau$'s must be used to choose $\tau_k$.
We will show that this inequality holds conditional on each rule, and thus
that it holds averaging over all of them. Conditional on $h_s$, if rule (1) or
(3) is used, then with probability one $\| L_{\tau_k}/L_s - 1 \| \geq \Delta_0/2n$. If rule 2 is
used,

$$\Pr[\, L_{\tau_k}/L_{\tau_k - 1} - 1 \leq -\Delta_0/n \mid h_s, \text{(rule 2 used)} ] \geq \Delta_0/n,$$

and also since rule 1 was not used at $\tau_k - 1$,

$$L_{\tau_k - 1}/L_s - 1 < \Delta_0/2n.$$

Combining the last two inequalities shows that

$$\Pr[\, L_{\tau_k}/L_s - 1 \leq (-\Delta_0/n + \Delta_0/2n - \Delta_0^2/2n^2) \mid h_s, \text{(rule 2)} ] \geq \Delta_0/n.$$

Since $(-\Delta_0/n + \Delta_0/2n - \Delta_0^2/2n^2) < -\Delta_0/2n$, we conclude that

$$\Pr[\, \| L_{\tau_k}/L_s - 1 \| \geq \Delta_0/2n \mid h_s, \text{(rule 2)} ] \geq \Delta_0/n. \qquad ■$$
k 


Next we state a key part of the proof: active supermartingales converge
to zero at a uniform rate that depends only on their initial value and their
degree of activity.

Theorem A.1: Let $\ell_0 > 0$, $\psi \in (0,1)$, and $\epsilon > 0$ be given. For each $\underline{L}$ with
$0 < \underline{L} < \ell_0$, there is a time $K < \infty$ such that

$$\Pr[\, \sup_{k > K} \tilde{L}_k \leq \underline{L}\, ] \geq 1 - \epsilon$$

for every active supermartingale $\tilde{L}$ with $\tilde{L}_0 = \ell_0$ and activity $\psi$.

This theorem is proved in the Appendix using results about upcrossing numbers.
The key aspect of the Theorem is that the bound $K$ depends only on $\ell_0$ and $\psi$,
and is independent of the particular supermartingale chosen.

To conclude, we recapitulate the proof of Theorem 1. From Lemmas 3 and 4
we know that for all histories where player 1 has always played $\sigma_1^*$, type $\omega_0$
receives at least $v_1^*$ in all periods t except possibly those where $t = \tau_k(h)$ for
some k. If we set $\ell_0 = (1-\mu^*)/\mu^*$ and $\underline{L} = (1-\Delta_0)/\Delta_0$ in Theorem A.1, we see that
for all $\epsilon > 0$ there is a $K(\mu^*, \epsilon)$ such that with probability at least $(1-\epsilon)$ type
$\omega_0$ receives $v_1^*$ in all but $K(\mu^*, \epsilon)$ periods. This implies that

$$\underline{V}_0(\delta, \mu) \geq (1-\epsilon)\, \delta^{K(\mu^*, \epsilon)}\, v_1^* + [1 - (1-\epsilon)\, \delta^{K(\mu^*, \epsilon)}] \min v_1.$$

Thus for any $\alpha > \epsilon$, by setting $\delta$ large enough that $\delta^{K(\mu^*, \epsilon)} \geq (1-\alpha)/(1-\epsilon)$ we have
that $\underline{V}_0(\delta, \mu) \geq (1-\alpha)\, v_1^* + \alpha \min v_1$.
5.    CONTINUUM OF COMMITMENT TYPES

The results above treat the case where the set $\Omega$ of commitment types is
countable. In this Section we consider the case where $\Omega$ includes a single
"sane" type $\omega_0$ which has positive prior probability, i.e. $\mu(\omega_0) > 0$, and a
continuum of "commitment types" $\omega(\sigma_1)$ corresponding to each of player 1's mixed
strategies $\sigma_1 \in \Sigma_1$ (and possibly other types as well). The probability
distribution over commitment types is given by a continuous density $d\mu(\omega)$.

For each strategy $\sigma_1$, define the sane type's corresponding commitment
payoff:

$$v_1^*(\sigma_1) = \min_{\sigma_2 \in B(\sigma_1)} v_1(\sigma_1, \sigma_2).$$

We will prove that type $\omega_0$ can approximate the commitment payoff to any $\sigma_1$ if
the discount factor $\delta$ is sufficiently close to one.

Theorem 2: Assume $\mu(\omega_0) > 0$, and that $d\mu$ is uniformly bounded below by $\eta > 0$
over all of the commitment types $\omega(\sigma_1)$. Then for all $\sigma_1$, and all $\alpha > 0$, there
is a $\underline{\delta} < 1$ such that for all $\delta \in (\underline{\delta}, 1)$, type $\omega_0$'s payoff is at least

$$(1-\alpha)\, v_1^*(\sigma_1) + \alpha \min v_1$$

in any Nash equilibrium of $G(\delta, \mu)$.

To prove the theorem, fix a Nash equilibrium $(\hat{\sigma}_1, \hat{\sigma}_2)$, and a $\sigma_1 \in \Sigma_1$ with
$\gamma_1 = \rho \circ \sigma_1$. For each $\epsilon > 0$, let $N_\epsilon$ be the $\epsilon$-neighborhood of $\sigma_1$ in the supremum
norm. As before, let $\nu(h_{t-1})$ be the distribution over outcomes implied by
$(\hat{\sigma}_1, \hat{\sigma}_2)$ when the history is $h_{t-1}$, and let $q(h_{t-1})$ be the probability distribution
over outcomes at $h_{t-1}$ conditional on $\omega$ being in the complement of $N_\epsilon$.
Lemma 6: There is an $\bar{\epsilon} > 0$ such that if $\epsilon < \bar{\epsilon}$ then $B(\sigma') \subseteq B(\sigma_1)$ for all
$\sigma' \in N_\epsilon$. Fixing $\epsilon < \bar{\epsilon}$, there is a $\Delta_0 > 0$ such that if $h_{t-1}$ has positive
probability and $d(q(h_{t-1}), \gamma_1) < \Delta_0$ then player 2 will play a best response to
$\sigma_1$ at time t.

Proof: Essentially the same as Lemma 1: the keys are the upper hemicontinuity
of $B$ and the assumption that player two has only finitely many pure
strategies. ■

For each history $h_{t-1}$ with positive probability, let $d\mu(\omega(\sigma_1) \mid h_{t-1})$ be
the conditional distribution over commitment types derived from the equilibrium
strategies. In the proof of Theorem 1, we considered the strategy for type $\omega_0$
of always playing a fixed mixed strategy $\sigma_1^*$. In the present case it will be
more convenient to consider a slightly more complicated strategy for type $\omega_0$.
Specifically, define a history-dependent sequence of distributions $p$ on the
outcome space $Y_1$ by

$$p(h_{t-1}) = \frac{\int_{\sigma_1 \in N_\epsilon} (\rho \circ \sigma_1)\, d\mu(\omega(\sigma_1) \mid h_{t-1})}{\int_{\sigma_1 \in N_\epsilon} d\mu(\omega(\sigma_1) \mid h_{t-1})}.$$

Define a family of random variables $(\hat{p}_t(h), \hat{q}_t(h))$ by setting $\hat{p}_t = p_k(h_{t-1})$
and $\hat{q}_t = q_k(h_{t-1})$ if $y_k$ occurs at time t, and define a second family $L_t$ as
follows: In period 0,

$$L_0(h) = \frac{\int_{\sigma_1 \notin N_\epsilon} d\mu(\omega(\sigma_1))}{\int_{\sigma_1 \in N_\epsilon} d\mu(\omega(\sigma_1))}.$$

Then define recursively

$$L_t(h) = \frac{\hat{q}_t(h)}{\hat{p}_t(h)}\, L_{t-1}(h).$$

This is the likelihood ratio for player 1 not being a type in $N_\epsilon$. The key
change required to our earlier proof is that if player 1 adopted the strategy
of always playing $\sigma_1$ the likelihood ratio would not be a supermartingale, as
its behavior at each date would depend on the relative weights given to types
$\omega$ in $N_\epsilon$. However, if player 1 adopts the strategy $\tilde{\sigma}_1^t$ defined by

$$\tilde{\sigma}_1^t(h_{t-1}) = \frac{\int_{\sigma_1 \in N_\epsilon} \sigma_1\, d\mu(\omega(\sigma_1) \mid h_{t-1})}{\int_{\sigma_1 \in N_\epsilon} d\mu(\omega(\sigma_1) \mid h_{t-1})},$$

that is, if he plays to mimic the average expected play of types in $N_\epsilon$, then
it is easy to see that $L_t$ is a supermartingale.


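Lemma 2': If player 1 adopts the strategy of playing $\tilde{\sigma}_1^t(h_{t-1})$ at each time t,
then $(L_t, h_t)$ is a supermartingale and

$$L_t(h) = \frac{1 - \int_{\sigma_1 \in N_\epsilon} d\mu(\omega(\sigma_1) \mid h_t)}{\int_{\sigma_1 \in N_\epsilon} d\mu(\omega(\sigma_1) \mid h_t)}.$$

Proof: Same as Lemma 2.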


From here on we can follow the proofs of Lemmas 3 through 5, replacing the event $\omega = \omega(\alpha_1)$ with $\omega \in N$ and replacing the strategy $\alpha_1$ with $\tilde{\alpha}_t$. We then extract the faster process $L_k$, which picks out the "bad" periods according to conditions 1 through 3 on page 15, and apply Theorem A.1 to conclude that there are unlikely to be many bad periods. Since Lemma 6 bounds player 1's payoff from below in the good periods, the conclusion of Theorem 2 follows.
APPENDIX: ACTIVE SUPERMARTINGALES

Our goal is to prove

Theorem A.1: Let $\ell_0 > 0$, $\varepsilon > 0$, and $\psi \in (0,1)$ be given. For each $\underline{L}$, $0 < \underline{L} < \ell_0$, there is a time $K < \infty$ such that $\Pr(\sup_{k > K} L_k < \underline{L}) > 1 - \varepsilon$ for every active supermartingale $L$ with $L_0 = \ell_0$ and activity $\psi$.

For a given active supermartingale the above is a simple consequence of the fact that $L$ converges to zero with probability one. The force of the theorem is to give a uniform bound on the rate of convergence for all supermartingales with a given activity $\psi$ and initial value $\ell_0$.

Throughout the appendix, we use $L$ to denote any supermartingale that satisfies the hypotheses of Theorem A.1. To prove the theorem, we will use some fundamental results from the theory of supermartingales, in particular bounds on the "upcrossing numbers" which we introduce below. These results can be found in Neveu [1975], Chapter II.

Fact A.2: For any supermartingale, $\Pr[\sup_{k} L_k > c] \le \min(1, L_0/c)$.
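As a sanity check on the maximal inequality stated above, the following sketch computes the exceedance probability exactly for one hypothetical process that is not taken from the paper: a multiplicative random walk that moves up or down by the fraction `psi` with equal probability, which is a nonnegative martingale. The function name and its parameters are illustrative assumptions.

```python
from collections import defaultdict
from fractions import Fraction


def prob_max_exceeds(l0, psi, c, horizon):
    """Exact Pr[max_{k <= horizon} L_k > c] for the illustrative walk
    L_{k+1} = L_k * (1 + psi) or L_k * (1 - psi), each with probability 1/2."""
    l0, psi, c = Fraction(l0), Fraction(psi), Fraction(c)
    if l0 > c:
        return Fraction(1)
    alive = {l0: Fraction(1)}  # probability mass of paths that have not exceeded c
    hit = Fraction(0)          # probability mass of paths that have exceeded c
    for _ in range(horizon):
        nxt = defaultdict(Fraction)
        for value, mass in alive.items():
            for factor in (1 + psi, 1 - psi):
                new_value = value * factor
                if new_value > c:
                    hit += mass / 2
                else:
                    nxt[new_value] += mass / 2
        alive = nxt
    return hit


p = prob_max_exceeds(Fraction(1), Fraction(1, 2), Fraction(3), 20)
print(float(p), "<=", float(Fraction(1, 3)))
```

Exact rational arithmetic is used so that the comparison with the bound $L_0/c$ is not blurred by rounding.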

Next, fix an interval $[a,b]$, $0 < a < b < \infty$, and define $U_k(a,b)$ to be the number of "upcrossings" of $[a,b]$ up to time $k$; let $U_\infty(a,b)$ be the total number of upcrossings (possibly equal to $\infty$).


Fact A.3: $\Pr[U_\infty(a,b) \ge N] \le (a/b)^N \min(L_0/a, 1)$.


This is known as Dubins' inequality. (See, for example, Neveu [1975], p. 27.)
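To make the notion of an upcrossing concrete, here is a minimal sketch that counts completed upcrossings along a finite path. It assumes the weak-inequality convention (a passage from a value at or below `a` to a value at or above `b`); the function name is hypothetical, and the convention in Neveu [1975] may differ at the boundary.

```python
def count_upcrossings(path, a, b):
    """Count completed upcrossings of [a, b] along a finite path.

    An upcrossing is a passage from a value <= a to a later value >= b.
    """
    count = 0
    below = False  # True once the path has been at or below a since the last upcrossing
    for value in path:
        if value <= a:
            below = True
        elif value >= b and below:
            count += 1
            below = False
    return count


path = [1.0, 0.4, 2.1, 0.3, 1.5, 2.5, 0.5]
print(count_upcrossings(path, a=0.5, b=2.0))  # prints 2
```

In this example the path rises from 0.4 to 2.1 and later from 0.3 to 2.5, giving two upcrossings; the final dip to 0.5 starts a third that is never completed.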

Next we observe that since $L$ has activity $\psi$, it makes a jump of size $\psi$ with probability at least $\psi$ in each period $k$ where $L_k$ is nonzero. Consequently, over a large number of periods either $L$ has jumped to zero or there are likely to be "many" jumps. Specifically, define $J_k$ to be the number of times $k' < k$ that $\|L_{k'+1}/L_{k'} - 1\| > \psi$.

Lemma A.4: For all $\varepsilon$ and $J$ there exists a $K$ such that $\Pr[(J_K \ge J) \text{ or } (L_K = 0)] \ge 1 - \varepsilon$.

Proof: Because $L$ has activity $\psi$, in each period $k'$, either $L_{k'} = 0$ or the probability of a jump of size $\psi$ at time $k'$ exceeds $\psi$. Define a sequence of indicator functions $I_k$ by $I_k = 1$ iff ($L_{k-1} = 0$ or $\|L_k/L_{k-1} - 1\| > \psi$), and set $S_K = \sum_{k \le K} I_k$. Each $I_k$ has expectation at least $\psi$, so for some $K$ sufficiently large, $\Pr[S_K \ge J] \ge 1 - \varepsilon$. Now if $S_K \ge J$, then either $L_k = 0$ for some $k \le K$, in which case $L_K = 0$ as well, or there have been at least $J$ jumps by time $K$.      ■
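To get a feel for how large $K$ must be in the lemma above, the sketch below treats a simplified benchmark that is an assumption and not the paper's setting: jumps arrive independently with probability exactly `psi` each period, so the jump count is binomial. The function name and the search cap `k_max` are hypothetical.

```python
from math import comb


def periods_needed(psi, J, eps, k_max=100_000):
    """Smallest K with Pr[Binomial(K, psi) >= J] >= 1 - eps.

    Benchmark only: assumes independent jumps with probability exactly psi.
    """
    for K in range(J, k_max + 1):
        # Probability of fewer than J jumps in K periods.
        fewer = sum(comb(K, j) * psi**j * (1 - psi) ** (K - j) for j in range(J))
        if 1.0 - fewer >= 1 - eps:
            return K
    raise ValueError("no K found below k_max")


print(periods_needed(psi=0.5, J=1, eps=0.1))  # prints 4
```

With `psi = 0.5` and a single required jump, three periods leave a 12.5% chance of no jump, while four periods reduce it to 6.25%, so the answer is 4.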

We have now established that most paths of $L$

(1)  Do not exceed $\bar{c}$ for $\bar{c}$ large (Fact A.2),

(2)  Make "few" upcrossings of any positive interval $[a,b]$ (Fact A.3), and

(3)  Either make "lots of jumps" or hit zero (Lemma A.4).

We will use these three conditions to show that for $K$ large, most paths remain below $\underline{L}$ from $K$ on. To do so, we first argue that most paths will pass below $\underline{c}$ by time $K$.

Divide the interval $[\underline{c}, \bar{c}]$ into $I$ equal subintervals with endpoints $e_1 = \underline{c}, \ldots, e_{I+1} = \bar{c}$. Then define the events

$E_1$ if $\max_{k \le K} L_k > \bar{c}$;

$E_2$ if at least one of the intervals $[e_i, e_{i+1}]$ is upcrossed $N$ or more times;

$E_3$ if $J_K < J$ and $L_K > 0$;

$E_4$ if $\min_{k \le K} L_k < \underline{c}$.


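By judicious choice of $\bar{c}$, $I$, $K$, $N$ and $J$, we will ensure that $E_4 \supset E_1^c \cap E_2^c \cap E_3^c$ and that $\Pr(E_1), \Pr(E_2), \Pr(E_3) \le \varepsilon/3$. This will yield our preliminary conclusion that

$$\Pr\Big[\min_{k \le K} L_k < \underline{c}\Big] = \Pr(E_4) \ge 1 - \varepsilon.$$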

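If we choose

$$\underline{c} = (\varepsilon/3)\,\underline{L},$$

Fact A.2 implies that

$$\Pr\Big[\max_{k > K} L_k > \underline{L} \,\Big|\, \min_{k \le K} L_k < \underline{c}\Big] \le \underline{c}/\underline{L} = \varepsilon/3,$$

giving us the desired conclusion that

$$\Pr\Big[\max_{k > K} L_k > \underline{L}\Big] \le \varepsilon + (1-\varepsilon)\,\varepsilon/3.$$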

Turning first to $E_1$, we can again use Fact A.2 to choose

$$\bar{c} = (3/\varepsilon)\,\ell_0$$

and ensure that $\Pr(E_1) = \Pr(\max_{k \le K} L_k > \bar{c}) \le \varepsilon/3$. Note for future reference that this is true regardless of how we pick $K$.

In the range above $\underline{c}$, when $\|L_k/L_{k-1} - 1\| > \psi$, $\|L_k - L_{k-1}\| \ge \psi\underline{c}$. Thus, if we choose

$$I > 2\bar{c}/(\underline{c}\,\psi) + 1$$

and if $L_k > (1+\psi)\underline{c}$, then there is at least a $\psi$ chance of crossing one of the subintervals $[e_i, e_{i+1}]$. On the other hand, a path that remains between $\underline{c}$ and $\bar{c}$ and has $J$ or more jumps across subintervals must cross at least one subinterval $(J-I)/2I - 1$ times. Consequently, if we choose

$$(*) \qquad N \le (J-I)/2I - 1$$

then $E_4 \supset E_1^c \cap E_2^c \cap E_3^c$, as required. In other words, a path that does not go above $\bar{c}$, that does not upcross any subinterval in $[\underline{c}, \bar{c}]$ $N$ or more times, and jumps $J$ or more times, must fall below $\underline{c}$. By Fact A.3, we know that for any given subinterval, the probability of $N$ or more upcrossings is not more than

$$(1-\psi)^N\, \ell_0/\underline{c}.$$

Consequently, the probability that some subinterval is upcrossed $N$ or more times is no more than

$$I\,(1-\psi)^N\, \ell_0/\underline{c}.$$

To make $\Pr(E_2) < \varepsilon/3$ we should choose

$$N > \frac{\log\big(3 I \ell_0/(\underline{c}\,\varepsilon)\big)}{-\log(1-\psi)}.$$

This determines $J$ by (*) above:

$$J = 2I(N+1) + I.$$

Finally, choose $K$ by Lemma A.4 to make $\Pr(E_3) < \varepsilon/3$.
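The order in which the constants are chosen can be summarized in a short sketch. It assumes the following readings of the choices made in the argument: the lower threshold is $(\varepsilon/3)$ times the target level, the upper threshold is $(3/\varepsilon)\ell_0$, $I$ exceeds $2\bar{c}/(\underline{c}\psi) + 1$, $N$ is large enough that $I(1-\psi)^N \ell_0/\underline{c} < \varepsilon/3$, and $J = 2I(N+1) + I$. The function name, argument names, and dictionary keys are hypothetical.

```python
import math


def choose_constants(eps, psi, l0, L_low):
    """Pick (c_low, c_high, I, N, J) in the order the argument uses them."""
    c_low = (eps / 3) * L_low   # lower threshold
    c_high = (3 / eps) * l0     # upper threshold, so that l0 / c_high = eps / 3
    # Smallest integer strictly above 2 * c_high / (c_low * psi) + 1.
    I = math.floor(2 * c_high / (c_low * psi) + 1) + 1
    # Smallest integer strictly above log(3 I l0 / (c_low eps)) / (-log(1 - psi)).
    N = math.floor(math.log(3 * I * l0 / (c_low * eps)) / -math.log(1 - psi)) + 1
    # Chosen so that N = (J - I) / (2 I) - 1 holds with equality.
    J = 2 * I * (N + 1) + I
    return {"c_low": c_low, "c_high": c_high, "I": I, "N": N, "J": J}


constants = choose_constants(eps=0.3, psi=0.5, l0=1.0, L_low=0.5)
print(constants)
```

The time $K$ is then whatever horizon makes the probability of fewer than $J$ jumps (without hitting zero) small, which depends on the process and is not computed here.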